Generalized Entropy and Decision Trees

Authors

  • Dan A. Simovici
  • Szymon Jaroszewicz
Abstract

We extend Shannon conditional entropy to a more general, parameterized form of conditional entropy that captures both the conditional Shannon entropy and a similar notion related to the Gini index. The proposed family of conditional entropies generates a collection of metrics over the set of partitions of finite sets, and these metrics can be used to construct decision trees. Experimental results suggest that, by varying the parameter that defines the entropy, it is possible to obtain smaller decision trees for certain databases without sacrificing classification accuracy.
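The abstract does not state the formula for the generalized entropy, so the following is only a minimal sketch under an assumption: it uses the Havrda-Charvát (Daróczy) form H_β(p) = (1 - Σ p_i^β) / (1 - 2^(1-β)), which reduces to Shannon entropy in bits as β approaches 1 and to twice the Gini index at β = 2. The function names, the β-power weighting of blocks in the conditional entropy, and the distance built from the two conditional entropies are illustrative choices, not confirmed details of the paper.

```python
import math
from collections import Counter


def beta_entropy(labels, beta):
    """Generalized entropy of the partition induced by `labels`.

    Assumed form (not given in the abstract):
    (1 - sum p_i^beta) / (1 - 2^(1 - beta)); Shannon entropy at beta = 1.
    """
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    if math.isclose(beta, 1.0):
        return -sum(p * math.log2(p) for p in probs)
    return (1.0 - sum(p ** beta for p in probs)) / (1.0 - 2.0 ** (1.0 - beta))


def conditional_beta_entropy(labels, attr, beta):
    """Entropy of `labels` given the partition induced by `attr`.

    Each block is weighted by (block size / n)^beta, an assumed choice
    that makes H(labels, attr) = H(labels | attr) + H(attr) hold.
    """
    n = len(labels)
    blocks = {}
    for y, a in zip(labels, attr):
        blocks.setdefault(a, []).append(y)
    return sum(
        (len(block) / n) ** beta * beta_entropy(block, beta)
        for block in blocks.values()
    )


def beta_distance(labels, attr, beta):
    """Symmetric distance between two partitions (hypothetical helper)."""
    return (conditional_beta_entropy(labels, attr, beta)
            + conditional_beta_entropy(attr, labels, beta))
```

Under these assumptions, a tree builder would evaluate `beta_distance(class_labels, attribute_values, beta)` for each candidate attribute and split on the one with the smallest distance; varying `beta` changes which attribute wins, which is the knob the abstract refers to.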

Related papers

Comparison of Shannon, Renyi and Tsallis Entropy Used in Decision Trees

Shannon entropy, as used in standard top-down decision trees, does not guarantee the best generalization. Split criteria based on generalized entropies offer a different compromise between the purity of nodes and overall information gain. Modified C4.5 decision trees based on Tsallis and Renyi entropies have been tested on several high-dimensional microarray datasets with interesting results. This approac...


Unifying Decision Trees Split Criteria Using Tsallis Entropy

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms, which have the drawback of finding only local optima. Besides, common split criteria, e.g. Shannon entropy, Gain ...


Application of Different Methods of Decision Tree Algorithm for Mapping Rangeland Using Satellite Imagery (Case Study: Doviraj Catchment in Ilam Province)

The use of satellite imagery to study Earth's resources has attracted many researchers. Different phenomena have different spectral responses to electromagnetic radiation. One major application of satellite data is the classification of land cover. In recent years, a number of classification algorithms have been developed for the classification of remote sensing data. One of the most nota...


Generalized Entropy for Splitting on Numerical Attributes in Decision Trees

Decision trees are well known for their training efficiency and their interpretable knowledge representation. They apply a greedy search and a divide-and-conquer approach to learn patterns. The greedy search relies on an evaluation criterion applied to the candidate splits at each node. Although research has been performed on various such criteria, there is no significant improvement from the classi...


An asymmetric entropy measure for decision trees

In this paper we present a new entropy measure for growing decision trees. The measure is asymmetric, which allows users to grow trees that better match their expectations in terms of recall and precision on each class. We then propose decision rules adapted to such trees. Experiments were carried out on real medical data from breast cancer screening units.




Publication date: 2002